Using Directed Graph Based BDMM Algorithm for Chinese Word Segmentation
نویسندگان
چکیده
Word segmentation is a key problem for Chinese text analysis. In this paper, with the consideration of both word-coverage rate and sentencecoverage rate, based on the classic Bi-Directed Maximum Match (BDMM) segmentation method, a character Directed Graph with ambiguity mark is designed for searching multiple possible segmentation sequences. This method is compared with the classic Maximum Match algorithm and Omni-segmentation algorithm. The experiment result shows that Directed Graph based BDMM algorithm can achieve higher coverage rate and lower complexity.
منابع مشابه
A MMSM-based Hybrid Method for Chinese MicroBlog Word Segmentation
After years of researches, Chinese word segmentation has achieved quite high precisions for formal style text. However, the performance of segmentation is not so satisfying for MicroBlog corpora. In this paper we describe a scheme for Chinese word segmentation for, MicroBlog which integrates the characterbased and word-based information in the directed graph generated by MMSM model. Word-level ...
متن کاملHMM and CRF Based Hybrid Model for Chinese Lexical Analysis
This paper presents the Chinese lexical analysis systems developed by Natural Language Processing Laboratory at Dalian University of Technology, which were evaluated in the 4th International Chinese Language Processing Bakeoff. The HMM and CRF hybrid model, which combines character-based model with word-based model in a directed graph, is adopted in system developing. Both the closed and open t...
متن کاملToward Better Chinese Word Segmentation for SMT via Bilingual Constraints
This study investigates on building a better Chinese word segmentation model for statistical machine translation. It aims at leveraging word boundary information, automatically learned by bilingual character-based alignments, to induce a preferable segmentation model. We propose dealing with the induced word boundaries as soft constraints to bias the continuous learning of a supervised CRFs mod...
متن کاملEffective Subsequence-based Tagging for Chinese Word Segmentation
Effective Subsequence-based Tagging for Chinese Word Segmentation Hai Zhao, Chunyu Kit (1. Department of Chinese, Translation and Linguistics, City University of Hong Kong, 83 Tat Avenue, Kowloon, Hong Kong SAR, China) Abstract: The research of automatic Chinese word segmentation has been advancing rapidly in recent years, especially since the First International Chinese Word Segmentation Bakeo...
متن کاملDAG-based Long Short-Term Memory for Neural Word Segmentation
Neural word segmentation has attracted more and more research interests for its ability to alleviate the effort of feature engineering and utilize the external resource by the pre-trained character or word embeddings. In this paper, we propose a new neural model to incorporate the wordlevel information for Chinese word segmentation. Unlike the previous wordbased models, our model still adopts t...
متن کامل